BAYESIAN  NONPARAMETRIC  BOOTSTRAP 
CONFIDENCE  INTERVALS 


Nils  Lid  Hjort 


Technical  Report  No.  20 
November  1985 


Laboratory  for 
Computational 
Statistics 


Department  of  Statistics 
Stanford  University 


BAYESIAN  NONPARAMETRIC  BOOTSTRAP  CONFIDENCE  INTERVALS* 


Nils  Lid  Hjort 

Department  of  Statistics,  Stanford  University 

and 

Norwegian  Computing  Centre 

LCS  Technical  Report  No.  20 
and 

Department  of  Statistics  Report  No.  240 
November  1985 
ABSTRACT 


Let $X_1, \ldots, X_n$ be a random sample from an unknown probability distribution $P$ on the sample space $\mathcal{X}$, and let $\theta = \theta(P)$ be a parameter of interest. The present paper gives a "Bayesian bootstrap" method of obtaining Bayes estimates and Bayesian confidence limits for $\theta$, using a (non-degenerate) Dirichlet process prior for $P$. This extends methods and results of Rubin (1981) and Efron (1982), who assume the sample space to be finite and use only a particular degenerate Dirichlet prior. An asymptotic justification of the Bayesian bootstrap is given, paralleling results of Bickel and Freedman (1981).


* Work supported by National Science Foundation Grant MCS80-24649 and Office of Naval Research contract N00014-83-K-0472.


1. Exact Bayesian intervals.

Let $X_1, \ldots, X_n$ be independent and identically distributed (i.i.d.) according to an unknown distribution $P$. For convenience take the sample space to be $\mathcal{X} = \mathbb{R}$, so that $P$ can be identified with its distribution function (c.d.f.) $F$. Most of the results in this report can be generalised to any complete, separable metric space $\mathcal{X}$.

Let $\theta = \theta(F)$ be a parameter functional of interest. We shall be concerned with Bayesian nonparametric confidence statements about $\theta$, and need to start out with a prior distribution on the space of all c.d.f.s. A natural class from which to choose is provided by Ferguson's (1973, 1974) Dirichlet processes; the class is rich, each member has large support, and basic posterior calculations are feasible. Thus let

$$F \sim_B \mathrm{Dir}(aF_0), \tag{1.1}$$

i.e. $F$ is a Dirichlet process with parameter $aF_0$. $F_0(\cdot) = E_B F(\cdot)$ is the prior guess c.d.f. whereas $a > 0$ has interpretation as prior sample size.

Identify the observed sample $x_1, \ldots, x_n$ with the empirical c.d.f.

$$\hat F(t) = \frac{1}{n} \sum_{i=1}^n I\{x_i \le t\}. \tag{1.2}$$

The posterior distribution of $F$ is

$$F \mid \hat F \sim_B \mathrm{Dir}(aF_0 + n\hat F). \tag{1.3}$$

($E_B$, $\sim_B$ etc. indicate statements relative to the chosen Bayesian framework.)

Thus the function

$$G(t) = \mathrm{Pr}_B\{\theta(F) \le t \mid \hat F\} \tag{1.4}$$

is in principle known. We wish to calculate $\theta_{LOW}$, $\theta_{UP}$ from data, satisfying

$$\mathrm{Pr}_B\{\theta_{LOW} \le \theta(F) \le \theta_{UP} \mid \hat F\} \doteq 1 - 2\alpha, \tag{1.5}$$

say. Thus

$$\theta_{LOW} = G^{-1}(\alpha), \qquad \theta_{UP} = G^{-1}(1-\alpha) \tag{1.6}$$

are the natural choices; $G^{-1}(p) = \inf\{t : G(t) \ge p\}$.

The fact that $G$ above is only rarely explicitly available, however, necessitates devising computational approximations. The problem becomes much simpler for the particular case $a \to 0$, which is the "non-informative" case Rubin (1981) and Efron (1982, Ch. 10) consider. (Actually, they consider only finite sample spaces, but the extension to the present generality is easily made via the theory of Dirichlet processes.) Then $F \mid \hat F$ is concentrated on the observed data values,

$$F(\cdot) = \sum_{i=1}^n d_i\, \delta(x_i), \tag{1.7}$$

with weights $(d_1, \ldots, d_n)$ following a Dirichlet $(1, \ldots, 1)$ distribution (uniform on the simplex of non-negative weights summing to one). It follows that values of $\theta(F)$ can be simulated according to (1.4), i.e. $G$ may be closely approximated by Monte Carlo. (The $d_i$'s may be simulated as $d_i = e_i/(e_1 + \cdots + e_n)$, where the $e_i$'s are i.i.d. unit exponential. If $\theta(F) = \int x\, dF(x)$ is the mean, for example, then a large number of realisations of $\theta(F) = \sum_{i=1}^n d_i x_i = \sum_{i=1}^n e_i x_i / \sum_{i=1}^n e_i$ can be generated; the histogram of these values would approximate $G$, enabling one to get good numerical approximations to the interval (1.6).) Rubin (1981) discusses this point and notes that the resulting approach, though different in interpretation, agrees well, operationally and inferentially, with the ordinary bootstrap procedure. This may be taken as but another example where Bayesian inference, starting with a non-informative ("neutral", "objective") prior distribution, resembles classical frequentist inference.
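
The Monte Carlo recipe in the parenthesis above can be sketched in a few lines of Python. This is a minimal illustration for the mean functional only; the function names `rubin_bb_mean` and `percentile_interval`, the seed handling and the default number of draws are assumptions made here for illustration and are not part of the report.

```python
import math
import random


def rubin_bb_mean(xs, boot=2000, seed=0):
    """Simulate theta(F) = sum d_i x_i with (d_1..d_n) ~ Dirichlet(1,...,1)."""
    rng = random.Random(seed)  # illustrative seeding, not from the report
    draws = []
    for _ in range(boot):
        # d_i = e_i / (e_1 + ... + e_n) with e_i i.i.d. unit exponential
        e = [rng.expovariate(1.0) for _ in xs]
        draws.append(sum(ei * xi for ei, xi in zip(e, xs)) / sum(e))
    return draws


def percentile_interval(draws, alpha):
    """Empirical version of (1.6), using G^{-1}(p) = inf{t : G(t) >= p}."""
    s = sorted(draws)

    def g_inv(p):
        # smallest order statistic whose empirical c.d.f. value is >= p
        k = max(1, math.ceil(p * len(s) - 1e-9))
        return s[k - 1]

    return g_inv(alpha), g_inv(1.0 - alpha)
```

For data `xs`, the call `percentile_interval(rubin_bb_mean(xs), 0.05)` would then give an approximate 90 percent interval for the mean.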

The  rest  of  the  report  is  concerned  with  the  informative  case  a  >  0. 




2. Approximating the posterior distribution of $\theta$.


For a few parameter functionals the posterior distribution (1.4) can be evaluated exactly. Section 5 provides calculations for $\theta = F(A)$, $A$ a set of interest, and $\theta = F^{-1}(p)$, the $p$-quantile. In some other instances $\theta(F) \mid \hat F$ can be simulated directly, i.e. a sequence $Y_1, Y_2, \ldots$ of i.i.d. variables with a distribution equal to or very close to $G$ can be generated, thus enabling one to obtain a close approximation to $G$ and to the sought-after $\theta_{LOW}$, $\theta_{UP}$.

Example. Let $\theta = \int x\, dF(x)$ be the unknown mean of $F$. The exact distribution of $\theta$ given data can be obtained for some choices of the prior guess $F_0$, but the resulting expressions are complicated, making "exact simulation" difficult. (This approach would have to use results of the type reached by Hannum, Hollander, and Langberg (1981) and Yamato (1984), but with Dirichlet process parameter $aF_0 + n\hat F$.) However, the posterior distribution of $\theta$ can be approximated with that of $\theta' = \sum_{i=1}^n x_i F\{x_i\} + \sum_{j=1}^m y_j F(A_j)$, say, where $A_1, \ldots, A_m$ is a fine partition of $\mathbb{R} - \{x_1, \ldots, x_n\}$, and $y_j \in A_j$. $\theta'$ can then be simulated. Hjort (1985) shows that $\alpha_m \to_d \alpha$ in $\mathcal{X}$ implies $\mathrm{Dir}(\alpha_m) \to_d \mathrm{Dir}(\alpha)$ in the space of probability measures on $\mathcal{X}$, and that $\int x\, dF_m(x) \to_d \int x\, dF(x)$ under a mild extra condition on $\{\alpha_m\}$. This justifies $\theta \approx \theta'$ above.

The example illustrates that (1.4) in general will be difficult to obtain from (1.3) via direct simulation of $\theta(F) \mid \hat F$. A simpler method, which we call the Bayesian bootstrap (BB) method, is however possible, and is now described. Note first that

$$\hat F_B(t) = E_B\{F(t) \mid \hat F\} = \frac{a}{a+n} F_0(t) + \frac{n}{a+n} \hat F(t) \tag{2.1}$$

is the natural Bayes estimate of $F(t)$. Generate a BB sample $X_1^*, \ldots, X_{n+a+1}^*$ from $\hat F_B$. (This is easy, provided it is feasible to sample from $F_0$: $X_i^*$ is from $F_0$ with probability $a/(a+n)$ and is equal to $x_j$ with probability $1/(a+n)$, $j = 1, \ldots, n$.) Define

$$\hat F_B^*(t) = \frac{1}{n+a+1} \sum_{i=1}^{n+a+1} I\{X_i^* \le t\} \tag{2.2}$$

and evaluate

$$\hat\theta_B^* = \theta(\hat F_B^*). \tag{2.3}$$

The proposed approximation to $G$ is

$$\hat G(t) = \mathrm{Pr}_{*,B}\{\hat\theta_B^* \le t\}, \tag{2.4}$$

which in practice would have to be evaluated as

$$\hat G(t) \doteq \frac{1}{BOOT} \sum_{b=1}^{BOOT} I\{\hat\theta_B^{*b} \le t\} \tag{2.5}$$

for a large number BOOT of independent drawings $\hat\theta_B^{*b}$ of the type described.
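The resulting confidence interval

$$\hat G^{-1}(\alpha) \le \theta(F) \le \hat G^{-1}(1-\alpha) \tag{2.6}$$

could be termed the BB percentile interval.

The description above assumed $a$ to be an integer. If $a = m + \beta$, say, with $0 < \beta < 1$ and $m$ an integer, generate $n + m + 2$ $X_i^*$'s from $\hat F_B$ instead, and use

$$\hat F_B^*(t) = \frac{1}{n+a+1} \Bigl[ \sum_{i=1}^{n+m+1} I\{X_i^* \le t\} + \beta\, I\{X_{n+m+2}^* \le t\} \Bigr].$$

The motivation for the BB method is as follows. The two conditional distributions $F \mid \hat F$ and $\hat F_B^* \mid \hat F$ are reasonably similar. In fact, judicious calculations give

$$E_B\{F(t) \mid \hat F\} = \hat F_B(t),$$
$$E_{*,B}\{\hat F_B^*(t) \mid \hat F\} = \hat F_B(t),$$
$$\mathrm{Var}_B\{F(t) \mid \hat F\} = \frac{1}{n+a+1}\, \hat F_B(t)\{1 - \hat F_B(t)\},$$
$$\mathrm{Var}_{*,B}\{\hat F_B^*(t) \mid \hat F\} = \frac{1}{n+a+1}\, \hat F_B(t)\{1 - \hat F_B(t)\}.$$

Hence, for well-behaved functionals $\theta = \theta(F)$ we would expect $\theta(F) \mid \hat F \approx \theta(\hat F_B^*) \mid \hat F$, i.e.

$$G \doteq \hat G. \tag{2.7}$$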
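
The BB procedure (2.1) to (2.6) can be sketched as follows for integer $a$. The sketch is only an illustration under stated assumptions: the caller supplies a sampler `sample_f0` for the prior guess $F_0$ and a function `theta` that evaluates the functional on a sample (that is, on its empirical c.d.f.). These names, the seed and the default `boot` are hypothetical choices, not taken from the report.

```python
import math
import random


def bb_draws(xs, a, sample_f0, theta, boot=1000, seed=0):
    """Return `boot` realisations of theta(F_B^*), cf. (2.2)-(2.3).

    Assumes a is a non-negative integer; each BB sample has size n + a + 1.
    """
    rng = random.Random(seed)
    n = len(xs)
    size = n + a + 1
    draws = []
    for _ in range(boot):
        # X_i^* comes from F_0 with probability a/(a+n), else is a uniformly
        # chosen data point, i.e. a draw from F_B = (a F_0 + n F_hat)/(a+n)
        sample = [sample_f0(rng) if rng.random() < a / (a + n) else rng.choice(xs)
                  for _ in range(size)]
        draws.append(theta(sample))
    return draws


def bb_percentile_interval(draws, alpha):
    """Empirical version of the BB percentile interval (2.6)."""
    s = sorted(draws)

    def g_inv(p):
        k = max(1, math.ceil(p * len(s) - 1e-9))
        return s[k - 1]

    return g_inv(alpha), g_inv(1.0 - alpha)
```

With a standard normal prior guess one might call `bb_draws(xs, 2, lambda rng: rng.gauss(0.0, 1.0), lambda s: sum(s) / len(s))` and pass the result to `bb_percentile_interval`.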

As a point of further comparison, it may be considered a bit annoying that the skewness of $F(t) \mid \hat F$ is about twice that of $\hat F_B^*(t) \mid \hat F$, but they are both small:

$$E_B[\{F(t) - \hat F_B(t)\}^3 \mid \hat F] = \frac{2}{(n+a+1)(n+a+2)}\, \hat F_B(t)\{1 - \hat F_B(t)\}\{1 - 2\hat F_B(t)\},$$
$$E_{*,B}[\{\hat F_B^*(t) - \hat F_B(t)\}^3 \mid \hat F] = \frac{1}{(n+a+1)^2}\, \hat F_B(t)\{1 - \hat F_B(t)\}\{1 - 2\hat F_B(t)\}.$$
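
The moment comparisons can be checked from two standard facts: under (1.3) the posterior of $F(t)$ is a Beta distribution, and for integer $a$ the BB count $(n+a+1)\hat F_B^*(t)$ is binomial. The following worked summary is a sketch; the shorthand $N = a + n$ and $p = \hat F_B(t)$ is introduced here only for brevity.

```latex
\begin{align*}
F(t) \mid \hat F &\sim_B \mathrm{Beta}\{Np,\ N(1-p)\}, \qquad N = a + n,\ p = \hat F_B(t),\\
\mathrm{Var}_B\{F(t) \mid \hat F\} &= \frac{p(1-p)}{N+1}, \qquad
E_B[\{F(t) - p\}^3 \mid \hat F] = \frac{2\,p(1-p)(1-2p)}{(N+1)(N+2)},\\
(N+1)\,\hat F_B^*(t) \mid \hat F &\sim \mathrm{Bin}(N+1,\ p),\\
\mathrm{Var}_{*,B}\{\hat F_B^*(t) \mid \hat F\} &= \frac{p(1-p)}{N+1}, \qquad
E_{*,B}[\{\hat F_B^*(t) - p\}^3 \mid \hat F] = \frac{p(1-p)(1-2p)}{(N+1)^2}.
\end{align*}
```

The variances agree exactly, while the ratio of the third central moments is $2(N+1)/(N+2)$, which is close to two.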


The  next  section  provides  an  asymptotic  justification  for  the  BB. 


Remark 1. Consider once more the non-informative case $a$ close to zero (or, rather, $a/n$ close to zero). Then the BB procedure advocates taking bootstrap samples of size $n + 1$ from the usual $\hat F$, as opposed to the traditional size $n$. This points to the fact that the BB sample size $n+a+1$ was chosen merely to make also the second moments of $F \mid \hat F$, $\hat F_B^* \mid \hat F$ agree.

Remark 2. Even disregarding the small $n$ versus $n + 1$ controversy, Rubin's (1981) "simple BB" does not come out of letting $a \to 0$ in the proposed BB of the present paper. Rubin's method smooths the weights, but rigidly sticks to the observed sample points (as does the ordinary bootstrap), cf. (1.7), whereas the more universally applicable method proposed here smooths also outside the data points, using $\hat F_B$.

One may call this paper's BB the informative Bayesian bootstrap and Rubin's BB the non-informative Bayesian bootstrap, in order to distinguish them. The remarks above indicate that the present informative version comes much closer to being a proper Bayesian generalisation of Efron's bootstrap.


3. Asymptotic justification.

Assume first, and mostly for illustrational purposes, that the sample space is finite, say $\mathcal{X} = \{1, \ldots, L\}$. Let $f_\ell = \Pr\{X_i = \ell\}$, $f_{0,\ell} = F_0\{\ell\}$, $\hat f_\ell = \hat F\{\ell\}$, and $\hat f_{B,\ell} = (a f_{0,\ell} + n \hat f_\ell)/(a + n)$. Efron (1982, Ch. 5.6) observes

$$\sqrt{n}\,(\hat f - f) \to_d N_L(0, \Sigma_f), \tag{3.1}$$
$$\sqrt{n}\,(\hat f^* - \hat f) \mid \hat f \to_d N_L(0, \Sigma_f) \quad \text{a.s.}, \tag{3.2}$$

where $\hat f^*$ stems from the ordinary bootstrap and where $\Sigma_f$ has elements $f_\ell \delta_{\ell m} - f_\ell f_m$, and discusses why this may be taken as an asymptotic justification for a class of inferential procedures based on the bootstrap. The above results rely only on asymptotic theory for the multinomial distribution.


(3.1), (3.2) can now be accompanied by results for the exact and BB approximated posterior distributions of $f$:

$$(n+a+1)^{1/2}\,(f - \hat f_B) \mid \hat f \to_d N_L(0, \Sigma_{f_\infty}) \quad \text{a.s.}, \tag{3.3}$$
$$(n+a+1)^{1/2}\,(\hat f_B^* - \hat f_B) \mid \hat f \to_d N_L(0, \Sigma_{f_\infty}) \quad \text{a.s.} \tag{3.4}$$

An explanation is needed here: we prefer on this occasion to study the limiting behaviour of $f \mid \hat f$, $\hat f_B^* \mid \hat f$ in the ordinary frequentist framework, where the observed frequencies $\hat f_\ell$ converge, on a set $\Omega_0$ having probability one, to the true ones, say $f_{true,\ell}$. The parameter $a$ may be fixed in (3.3), (3.4), but can also go to infinity with $n$, as long as

$$\hat f_{B,\ell} = \frac{a}{a+n} f_{0,\ell} + \frac{n}{a+n} \hat f_\ell \to f_{\infty,\ell} \quad \text{on } \Omega_0;$$

$f_\infty$ will be just $f_{true}$ provided $a/n \to 0$. Now the framework for (3.3), (3.4) is explained. (3.3) follows from asymptotic properties of the Dirichlet distribution, whereas (3.4) is essentially the central limit theorem. Note that exactly the same a.s. set $\Omega_0$ is at work in (3.2), (3.3), (3.4).

Efron’s  discussion  of  the  consequences  of  (3.1),  (3.2)  (1979,  p.  23; 
1982,  Ch.  5.6)  can  now  be  applied  to  (3.3),  (3.4),  and  provides  the  asymptotic 


justification  for  the  BB  procedure. 

Remark 3. It is interesting to note that if only $a/\sqrt{n} \to 0$, then $\sqrt{n}\,(f_\ell - \hat f_\ell) - \sqrt{n}\,(f_\ell - \hat f_{B,\ell})$ goes to zero, and

$$\sqrt{n}\,(f - \hat f) \mid \hat f \to_d N_L(0, \Sigma_{f_{true}}) \quad \text{a.s.}, \tag{3.5}$$
$$\sqrt{n}\,(\hat f_B^* - \hat f) \mid \hat f \to_d N_L(0, \Sigma_{f_{true}}) \quad \text{a.s.} \tag{3.6}$$

Accordingly, four different approaches will lead to the same inferential statements, up to first order asymptotics: the classical based on $\hat f$; the ordinary Efron bootstrap; the proper posterior Bayes; and the BB.


Now consider the extension of the preceding results and conclusions to $\mathcal{X} = \mathbb{R}$. The degree to which (3.1), (3.2) and its consequences have analogues for $\mathcal{X} = \mathbb{R}$ is investigated in Bickel and Freedman (1981) and Singh (1981). The canonical parallel to (3.1) is

$$\sqrt{n}\,\{\hat F(\cdot) - F(\cdot)\} \to_d W^0\{F(\cdot)\} \quad \text{in } D[-\infty, \infty], \tag{3.7}$$

where $W^0$ is a Brownian bridge, see for example Billingsley (1968). Bickel and Freedman (1981) prove the bootstrap companion

$$\sqrt{n}\,\{\hat F^*(\cdot) - \hat F(\cdot)\} \mid \hat F \to_d W^0\{F(\cdot)\} \quad \text{a.s.}, \tag{3.8}$$

and conclude that the bootstrap works for well-behaved functionals $\theta = \theta(F)$.

These results can be paralleled in the present Bayesian posterior context. Again, we look at limiting properties in an ordinary framework in which $\hat F$ according to the Glivenko-Cantelli theorem converges uniformly to $F = F_{true}$ on a set $\Omega_0$ of probability one.

Theorem. Let $a$ vary with $n$ in such a way that $\hat F_B = (aF_0 + n\hat F)/(a + n) \to$ some $F_\infty$ on $\Omega_0$. (Most often, $F_\infty$ is just $F_{true}$.) Then

$$(n+a+1)^{1/2}\,\{F(\cdot) - \hat F_B(\cdot)\} \mid \hat F \to_d W^0\{F_\infty(\cdot)\}, \tag{3.9}$$
$$(n+a+1)^{1/2}\,\{\hat F_B^*(\cdot) - \hat F_B(\cdot)\} \mid \hat F \to_d W^0\{F_\infty(\cdot)\}, \tag{3.10}$$

along every sequence in $\Omega_0$.

Proof: The second statement is within reach of (the triangular version of) the classical invariance theorem for i.i.d. random variables. The first statement involves showing finite-dimensional convergence, by looking at Dirichlet distributions, and proving tightness, which follows by the proof of Billingsley's (1968) Theorem 15.6, upon noticing that

$$(n+a+1)^2\, E_B[\{F(s,t] - \hat F_B(s,t]\}^2 \{F(t,u] - \hat F_B(t,u]\}^2 \mid \hat F] \le \hat F_B(s,u]^2, \quad s \le t \le u. \qquad \Box$$

Thus the conditional distributions $\theta(F) \mid \hat F$ and $\theta(\hat F_B^*) \mid \hat F$ will be close to each other for well-behaved functionals, justifying the BB method. Particular examples can be worked through, as in Bickel and Freedman (1981). Their tentative description of well-behavedness (p. 1209) can also be subscribed to here. Sufficient conditions for

$$(n+a+1)^{1/2}\,\{\theta(F) - \theta(\hat F_B)\} \mid \hat F \to_d N(0, \sigma^2) \quad \text{a.s.},$$
$$(n+a+1)^{1/2}\,\{\theta(\hat F_B^*) - \theta(\hat F_B)\} \mid \hat F \to_d N(0, \sigma^2) \quad \text{a.s.}$$

to hold, for appropriate variance $\sigma^2$, can be written down, using von Mises methods, for example as in Boos and Serfling (1980) and Parr (1985), who use Fréchet differentiability, or the more universally applicable machinery of Hadamard or compact differentiability, as in Reeds (1976) and Fernholz (1983).

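Remark 4. If $a$ is fixed, or only $a/\sqrt{n} \to 0$, then

$$\sqrt{n}\,\{F(\cdot) - \hat F(\cdot)\} \mid \hat F \to_d W^0\{F_{true}(\cdot)\}, \tag{3.11}$$
$$\sqrt{n}\,\{\hat F_B^*(\cdot) - \hat F(\cdot)\} \mid \hat F \to_d W^0\{F_{true}(\cdot)\} \quad \text{a.s.} \tag{3.12}$$

A conclusion concerning the approximate agreement among the four statisticians referred to in the previous remark thus can be reached also for $\mathcal{X} = \mathbb{R}$ (and for more general spaces).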

Remark 5. The functional $\theta = \theta(F)$ can depend upon $n$; the described BB procedure works specifically for the given $n$. $\theta$ is also allowed to depend upon the actual data sample, say $\theta = \theta(F, X_1, \ldots, X_n)$ (in contrast to simpler ones like $\theta(F) = \int x\, dF(x)$, $\sigma(F) = [\int \{x - \theta(F)\}^2\, dF(x)]^{1/2}$). An example is $\theta = \sup_t |F(t) - \hat F(t)|$, of importance in connection with confidence bands. Our BB method is general enough to handle such cases too.

Let us illustrate this comment with a description of how a nonparametric Bayesian might construct a simultaneous confidence band for $F$. Consider $\theta = \sup_{a \le t \le b} |F(t) - \hat F_B(t)| / [\hat F_B(t)\{1 - \hat F_B(t)\}]^{1/2}$. The natural band is

$$\hat F_B(t) - t_{1-\alpha}[\hat F_B(t)\{1 - \hat F_B(t)\}]^{1/2} \le F(t) \le \hat F_B(t) + t_{1-\alpha}[\hat F_B(t)\{1 - \hat F_B(t)\}]^{1/2}$$

for $a \le t \le b$, where $t_{1-\alpha}$ ideally would be determined by $\mathrm{Pr}_B\{\theta(F, X_1, \ldots, X_n) \le t_{1-\alpha} \mid \hat F\} = 1 - \alpha$. This $t_{1-\alpha}$ would be almost impossible to find. The BB method consists of generating perhaps 1000 values $\hat\theta_B^{*b}$ of $\hat\theta_B^* = \sup_{a \le t \le b} |\hat F_B^*(t) - \hat F_B(t)| / [\hat F_B(t)\{1 - \hat F_B(t)\}]^{1/2}$, and using $\hat t_{1-\alpha}$ = empirical upper $\alpha$-point for these 1000 realisations instead of $t_{1-\alpha}$. One may prove that $(n+a+1)^{1/2}(\hat t_{1-\alpha} - t_{1-\alpha}) \to 0$ a.s. in this case. (Strictly speaking, this is true provided $BOOT_n$ realisations are generated instead of 1000 and $BOOT_n/(n \log n)$ grows to infinity.)
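
The band construction of Remark 5 might be sketched as below. Several simplifications are assumptions of this sketch rather than of the report: the supremum over the interval is replaced by a maximum over a finite `grid` of $t$ values, the prior strength `a` is a non-negative integer, and the prior guess is passed as a c.d.f. `f0_cdf` together with a sampler `sample_f0`. All names are hypothetical.

```python
import math
import random


def bb_band(xs, a, f0_cdf, sample_f0, grid, alpha=0.10, boot=1000, seed=0):
    """Return (F_B on grid, lower band, upper band, estimated t_{1-alpha})."""
    rng = random.Random(seed)
    n = len(xs)
    size = n + a + 1  # BB sample size; a assumed to be a non-negative integer
    # Bayes estimate F_B(t) = {a F_0(t) + n F_hat(t)} / (a + n), cf. (2.1)
    fb = [(a * f0_cdf(t) + sum(x <= t for x in xs)) / (a + n) for t in grid]
    scale = [math.sqrt(p * (1.0 - p)) for p in fb]
    if min(scale) <= 0.0:
        raise ValueError("grid must satisfy 0 < F_B(t) < 1")
    sups = []
    for _ in range(boot):
        star = [sample_f0(rng) if rng.random() < a / (a + n) else rng.choice(xs)
                for _ in range(size)]
        # max over the grid of |F_B^*(t) - F_B(t)| / [F_B(t){1 - F_B(t)}]^(1/2)
        sups.append(max(abs(sum(x <= t for x in star) / size - p) / s
                        for t, p, s in zip(grid, fb, scale)))
    sups.sort()
    k = max(1, math.ceil((1.0 - alpha) * boot - 1e-9))
    t_hat = sups[k - 1]  # empirical upper alpha-point of the BB realisations
    lower = [p - t_hat * s for p, s in zip(fb, scale)]
    upper = [p + t_hat * s for p, s in zip(fb, scale)]
    return fb, lower, upper, t_hat
```

The returned limits are not clipped to $[0, 1]$; a user may wish to do so for display.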

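Remark 6. The bootstrap sample size BOOT in (2.5) should of course be large in order for $\hat G_{BOOT}^{-1}(1-\alpha)$, $\hat G_{BOOT}^{-1}(\alpha)$ to come close to $\hat G^{-1}(1-\alpha)$, $\hat G^{-1}(\alpha)$, where $\hat G_{BOOT}$ denotes the Monte Carlo version of $\hat G$ in (2.5). The investigation of Efron (1985, Section 8) indicates that BOOT = 1000 may be a rough minimum.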

Remark 7. The exposition of the BB method has so far emphasized its use to construct confidence intervals. There are other uses for (an approximation to) the posterior distribution, however, a major example being the evaluation of the usual Bayes estimate (under quadratic loss), $\theta_B = \int t\, dG(t)$. Closed form solutions are only available for special cases. An approximation is now possible,

$$\theta_B \doteq \int t\, d\hat G(t) \doteq \frac{1}{BOOT} \sum_{b=1}^{BOOT} \hat\theta_B^{*b}.$$

BOOT need not in this case be as large as 1000 to make the second approximation a good one; BOOT = 100 may be sufficient, cf. Efron (1985, Section 8).

Remark 8. The starting point for our quest for the construction of confidence intervals has been (1.6). Sometimes highest posterior regions are advocated instead, see for example Box and Tiao (1973). In the present case this would involve approximating the posterior distribution $G$ with one with a density $g(t)$, and then letting $\{t : g(t) \ge g_0\}$ be the confidence region, for appropriate level $g_0$. This approach makes most sense when $g$ is unimodal, which it would not be in many important cases here, due to the fact that the posterior distribution of $F$ places extra weight on the observed data points. This is illustrated in Section 5 for the case of the median.

Remark 9. The BB method is easily generalised to, for example, two-sample situations. To illustrate, let $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ be samples from respectively $F_1$ and $F_2$, and assume $\theta(F_1, F_2) = F_1^{-1}(\tfrac12) - F_2^{-1}(\tfrac12)$ is of interest. A Bayes estimate and a confidence interval for this difference of population medians can be obtained by generating perhaps 1000 realisations of $\hat\theta_B^* = \mathrm{median}\{X_1^*, \ldots, X_{n+a+1}^*\} - \mathrm{median}\{Y_1^*, \ldots, Y_{m+b+1}^*\}$, where the $X_i^*$'s are drawn from $(aF_{1,0} + n\hat F_1)/(a+n)$ and the $Y_i^*$'s from $(bF_{2,0} + m\hat F_2)/(b+m)$, and then treating the resulting histogram as the posterior distribution of $\theta$.
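
A minimal sketch of the two-sample recipe in Remark 9, assuming integer prior strengths `a` and `b` and user-supplied samplers `sample_f10`, `sample_f20` for the two prior guesses (all names are hypothetical). The median is taken as $\inf\{t : F(t) \ge \tfrac12\}$ of the BB empirical c.d.f., which is what `statistics.median_low` returns.

```python
import random
import statistics


def bb_sample(xs, a, sample_f0, rng):
    """n + a + 1 draws from F_B = (a F_0 + n F_hat)/(a + n); a a non-negative integer."""
    n = len(xs)
    return [sample_f0(rng) if rng.random() < a / (a + n) else rng.choice(xs)
            for _ in range(n + a + 1)]


def bb_median_difference(xs, ys, a, b, sample_f10, sample_f20, boot=1000, seed=0):
    """BB realisations of median(F_1) - median(F_2)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(boot):
        # median_low picks inf{t : F(t) >= 1/2} for the empirical c.d.f.
        mx = statistics.median_low(bb_sample(xs, a, sample_f10, rng))
        my = statistics.median_low(bb_sample(ys, b, sample_f20, rng))
        draws.append(mx - my)
    return draws
```

The average of the returned draws approximates the Bayes estimate, and their empirical quantiles give the BB percentile interval.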

Remark 10. It is perhaps surprising that a simple method like the BB, constructed merely to make the first and second moments of the exact and the approximate distributions of $F\{A\}$ agree for each $A \subset \mathcal{X}$, can work well for the vast majority of parameter functionals. As indicated in Section 3, this is at least partly the work and the magic of the central limit theorem. This also points to the possibility of using "small-sample asymptotics" machinery to arrive at other approximations to the posterior distribution $G$, for example Edgeworth-Cramér expansions combined with Taylor expansions. Such an approach would be functional-dependent, however; a primary virtue of the BB is that it is both simple and versatile. A similar remark of course applies to the usual bootstrap.

Remark 11. The BB has been constructed for situations where the statistician is willing to approximate the prior uncertainty about the unknown $F$ with a Dirichlet process, involving only the prior guess $F_0$ and the "prior sample size" parameter $a$. One can conceivably construct similar bootstrap-like devices for more complex prior distributions, like mixtures of Dirichlet processes and neutral to the right processes. This would be valuable, considering the difficulty with which even Bayes point estimates are evaluated in such situations.

Remark 12. A criticism sometimes voiced against the ordinary nonparametric bootstrap is that it too rigidly sticks to the observed data points. The Bayesian bootstrap proposed in this paper provides a generalisation of the ordinary bootstrap, as indicated in Remarks 1 and 2, towards having the possibility of smoothing also outside the data, in a reasonable and non-ad hoc way (unless one discards Bayesian statistics in general as being too ad hoc). Let us point out that hybrids can be invented. One may take the "strength of belief" constant $a$, which in principle is user-defined, to be just an unknown parameter instead, and estimate it based on the data. A large estimated $a$ should result if the data fits $F_0$ to a high degree; if the data seriously contradicts $F_0$ then the estimated $a$ should be close to zero. A further de-Bayesification of the Bayesian bootstrap could allow unknown parameters in $F_0$ too, placing matters in an empirical Bayes framework, and estimate these too from the data.


4. A bias corrected BB percentile interval.

The BB percentile interval cannot be bias-corrected the way this is done for the ordinary bootstrap case, as presented e.g. in Efron (1982, Ch. 10). In fact, one can argue that the "bias" is already taken care of. If there exists a smooth increasing transformation $h$ such that

$$h\{\theta(F)\} - h(\hat\theta_{obs}) \sim_B N(z_0\sigma, \sigma^2), \tag{4.1}$$
$$h\{\theta(\hat F_B^*)\} - h(\hat\theta_{obs}) \sim_{*,B} N(z_0\sigma, \sigma^2) \tag{4.2}$$

for some constants $z_0$, $\sigma$, putting $\hat\theta_{obs} = \theta(\hat F_B)$ for short (these assumptions appear reasonable in view of the preceding section), then one may deduce $z_0 = -\Phi^{-1}\{\hat G(\hat\theta_{obs})\}$ as in the cited reference, but the bias corrected interval on the $h(\theta)$ scale,

$$h(\hat\theta_{obs}) + z_0\sigma - z^{(1-\alpha)}\sigma \le h(\theta) \le h(\hat\theta_{obs}) + z_0\sigma + z^{(1-\alpha)}\sigma,$$

where $z^{(1-\alpha)} = \Phi^{-1}(1-\alpha)$ is the upper $\alpha$-point for the standard normal, happens to transform back to (2.6) on the $\theta$ scale:

$$\hat G[h^{-1}\{h(\hat\theta_{obs}) + z_0\sigma + z^{(1-\alpha)}\sigma\}] = \mathrm{Pr}_{*,B}\{h(\hat\theta_B^*) \le h(\hat\theta_{obs}) + z_0\sigma + z^{(1-\alpha)}\sigma\} = \Pr\{N(z_0, 1) \le z_0 + z^{(1-\alpha)}\} = 1 - \alpha.$$

Even if the above approach had led to something non-trivial the result would not have been as trustworthy as Efron's bias corrected percentile interval is, comparatively speaking. The comments about skewness following (2.7) would imply that the implicit assumption in (4.1), (4.2), namely that $z_0$ can be taken to be the same in the two situations, hardly could be trusted, this in contrast to the ordinary bootstrap framework in which it is known that the bootstrap approximation usually is good also to the "next order". It may however be possible to deduce a simple and likely relationship between the two $z_0$'s in (the rephrased) (4.1), (4.2), and then correct the BB percentile interval based on this.


There is another possibility of detecting and repairing a bias, however. For each in a respectable catalogue of examples there is a known transformation $h$, perhaps the identity, such that the posterior expectation of $h\{\theta(F)\}$ is explicitly calculable by some published formula, i.e.

$$V_0 = E_B[h\{\theta(F)\} \mid \hat F]$$

is known. The BB procedure considers

$$\hat H(t) = \mathrm{Pr}_{*,B}[h\{\theta(\hat F_B^*)\} \le t \mid \hat F] = \hat G\{h^{-1}(t)\}$$

and approximates $V_0$ with

$$\hat V_0 = \int t\, d\hat H(t) \doteq \frac{1}{BOOT} \sum_{b=1}^{BOOT} h\{\theta(\hat F_B^{*b})\} = V_0 + \varepsilon, \tag{4.3}$$

say. Accordingly, if $\varepsilon \ne 0$, then $\hat H$ is not a perfect estimate of $H$, the c.d.f. of $h\{\theta(F)\} \mid \hat F$. $\hat H_\varepsilon(t) = \hat H(t + \varepsilon)$ is a new estimate, this time getting the mean right. Hence

$$\hat H_\varepsilon^{-1}(\alpha) = \hat H^{-1}(\alpha) - \varepsilon \le h\{\theta(F)\} \le \hat H^{-1}(1 - \alpha) - \varepsilon = \hat H_\varepsilon^{-1}(1 - \alpha)$$

would be a natural corrected confidence interval for $h\{\theta(F)\}$. Transforming back we obtain

$$h^{-1}[h\{\hat G^{-1}(\alpha)\} - \varepsilon] \le \theta(F) \le h^{-1}[h\{\hat G^{-1}(1 - \alpha)\} - \varepsilon] \tag{4.4}$$

as the bias corrected BB percentile interval for $\theta(F)$. Of course this interval is just (2.6) if $\varepsilon = 0$ above.
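
The correction (4.4) is a simple shift on the $h$ scale, as the following sketch illustrates. It assumes the caller already has BB realisations `draws` of $\theta(\hat F_B^*)$, an increasing transformation `h` with inverse `h_inv`, and the known posterior mean `v0`; these argument names are hypothetical.

```python
import math


def bias_corrected_interval(draws, h, h_inv, v0, alpha):
    """Bias corrected BB percentile interval, cf. (4.3)-(4.4).

    Returns (lower, upper, eps) where eps is the Monte Carlo estimate of
    the discrepancy between the BB mean of h(theta) and the known v0.
    """
    hs = [h(d) for d in draws]
    eps = sum(hs) / len(hs) - v0  # epsilon in (4.3)
    s = sorted(draws)

    def g_inv(p):
        # empirical G_hat^{-1}(p) = inf{t : G_hat(t) >= p}
        k = max(1, math.ceil(p * len(s) - 1e-9))
        return s[k - 1]

    lower = h_inv(h(g_inv(alpha)) - eps)
    upper = h_inv(h(g_inv(1.0 - alpha)) - eps)
    return lower, upper, eps
```

When `eps` comes out as zero the function returns the uncorrected percentile interval.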

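One can also write down a slightly more general bias and variance corrected BB percentile interval which also takes into account the value of $\tau_0^2 = \mathrm{Var}_B[h\{\theta(F)\} \mid \hat F]$ if it is available. Assume that, in addition to (4.3),

$$\hat\tau_0^2 = \int (t - \hat V_0)^2\, d\hat H(t) \doteq \frac{1}{BOOT} \sum_{b=1}^{BOOT} [h\{\theta(\hat F_B^{*b})\} - \hat V_0]^2 = \tau_0^2 (1 + \delta)^2.$$

A perhaps better estimate of $G\{h^{-1}(t)\}$ is then $\hat H_{\varepsilon,\delta}(t) = \hat H\{t(1+\delta) + \varepsilon - V_0\delta\}$, since it gets both the mean and the variance right. Using $\hat H_{\varepsilon,\delta}^{-1}(p) = \{\hat H^{-1}(p) + V_0\delta - \varepsilon\}/(1 + \delta)$ one ends up with

$$h^{-1}\Bigl[\frac{h\{\hat G^{-1}(\alpha)\} + V_0\delta - \varepsilon}{1 + \delta}\Bigr] \le \theta(F) \le h^{-1}\Bigl[\frac{h\{\hat G^{-1}(1-\alpha)\} + V_0\delta - \varepsilon}{1 + \delta}\Bigr]. \tag{4.5}$$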

5. Some examples.

This section looks briefly into the nature of the BB approximation method, and compares confidence intervals arising from different prior distributions in an artificial example.


5.1. A probability.

If $\theta(F) = P(A)$ for some set $A$ of interest, then

$$\theta(F) \mid \hat F \sim_B \mathrm{Beta}\{aF_0(A) + \#(x_i \in A),\ aF_0(A^c) + \#(x_i \notin A)\}.$$

Thus (1.5) and (1.6) can be obtained from tables of the incomplete Beta function. In this case the BB method amounts to approximating the Beta distribution $G$ above with that of $Y/(n+a+1)$, $Y$ being binomial $[n+a+1, \{aF_0(A) + \#(x_i \in A)\}/(a+n)]$. If $U$ is Beta $\{mp, m(1-p)\}$ and $V$ is Bin $\{m+1, p\}/(m+1)$, then $EU = EV = p$ and $\mathrm{Var}\, U = \mathrm{Var}\, V = p(1-p)/(m+1)$. $U$ and $V$ differ in skewness and kurtosis, but not to any dramatic extent:

$$\mathrm{skew}(U) = \frac{2(m+1)^{1/2}}{m+2} \cdot \frac{1-2p}{\{p(1-p)\}^{1/2}},$$
$$\mathrm{skew}(V) = \frac{1}{(m+1)^{1/2}} \cdot \frac{1-2p}{\{p(1-p)\}^{1/2}},$$

$$\mathrm{kurtosis}(U) = \frac{6m}{(m+2)(m+3)} \Bigl\{\frac{1+1/m}{p(1-p)} - 5 - \frac{6}{m}\Bigr\},$$
$$\mathrm{kurtosis}(V) = \frac{1}{m+1} \cdot \frac{1 - 6p(1-p)}{p(1-p)}.$$

Brief investigations have shown the distributions of $U$ and $V$, and therefore confidence intervals based on either the exact or BB approximated distributions, to be remarkably similar even for moderate $m$, provided $p$ is not too close to zero or one, provided $\alpha$ is not too close to zero, and finally provided the discrete distribution of $V$ is interpolated. Rather than using $\hat G(t) = \Pr[\mathrm{Bin}\{m+1, p\}/(m+1) \le t]$, which jumps at the points $j/(m+1)$, we use $\hat G\{j/(m+1)\} = \tfrac12 \Pr\{V \le j/(m+1)\} + \tfrac12 \Pr\{V \le (j-1)/(m+1)\}$ (which is what one gets if one interprets $\Pr\{V \le j/(m+1)\}$ as $\Pr\{V + \varepsilon \le j/(m+1)\}$, where $\varepsilon$ is independent of $V$ and, say, normal with mean zero and very small variance) and interpolate linearly in-between. Similar modifications to $\hat G$ are advocated in other cases where $\hat G$ increases in sharp jumps only.
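
The interpolation just described might be coded as follows. This is a sketch under assumptions: `m` is taken to be a non-negative integer, and the behaviour outside $[0, 1]$ (returning 0 below and 1 above) is a convention chosen here, not one stated in the report. Function names are hypothetical.

```python
import math


def binom_cdf(k, size, p):
    """Pr{Bin(size, p) <= k}; zero for k < 0."""
    return sum(math.comb(size, i) * p**i * (1.0 - p)**(size - i)
               for i in range(0, min(k, size) + 1))


def interpolated_cdf(t, m, p):
    """Mid-point corrected c.d.f. of V = Bin(m+1, p)/(m+1), linear in-between."""
    size = m + 1  # m assumed to be a non-negative integer in this sketch
    if t < 0.0:
        return 0.0
    if t > 1.0:
        return 1.0

    def mid(j):
        # value assigned at the lattice point j/(m+1)
        return 0.5 * binom_cdf(j, size, p) + 0.5 * binom_cdf(j - 1, size, p)

    pos = t * size
    j = min(int(math.floor(pos)), size)
    if j == size:
        return mid(size)
    # linear interpolation between neighbouring lattice points
    return mid(j) + (pos - j) * (mid(j + 1) - mid(j))
```

For symmetric cases such as `p = 0.5` the interpolated c.d.f. passes through one half at `t = 0.5`, as one would hope.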


5.2. The median.

The $p$-quantile functional is another example where it is possible to compute the posterior distribution explicitly, but the resulting expressions are complex, and the BB would be much easier to carry out in practice. For simplicity only the median $\theta(F) = F^{-1}(\tfrac12) = \inf\{t : F(t) \ge \tfrac12\}$ is considered below.

Assume for concreteness that the data points are distinct, say $x_1 < \cdots < x_n$. We shall find $G(t) = \mathrm{Pr}_B\{\theta(F) \le t \mid \hat F\}$.

First look at a data point $x_j$. Then

$$G\{x_j\} = \mathrm{Pr}_B\{\theta(F) = x_j \mid \hat F\} = \mathrm{Pr}_B\{F(-\infty, x_j) < \tfrac12,\ F(-\infty, x_j] \ge \tfrac12 \mid \hat F\} = \Pr\{U < \tfrac12,\ U + V \ge \tfrac12\} = \Pr\{U < \tfrac12,\ W \le \tfrac12\},$$

where $(U, V, W)$ is Dirichlet $(\alpha, \beta, \gamma)$; $\alpha = aF_0(x_j-) + j - 1$, $\beta = aF_0\{x_j\} + 1$, $\gamma = aF_0(x_j, \infty) + n - j$. Assuming the prior guess c.d.f. to be continuous one gets

$$G\{x_j\} = \frac{\Gamma(\alpha+\beta+\gamma)}{\Gamma(\alpha)\Gamma(\beta)\Gamma(\gamma)} \cdot \frac{1}{\alpha} \cdot \frac{1}{\gamma} \Bigl(\tfrac12\Bigr)^{\alpha+\gamma} = \frac{\Gamma(a+n)}{\Gamma(aF_0(x_j)+j)\,\Gamma(a\{1-F_0(x_j)\}+n-j+1)} \Bigl(\tfrac12\Bigr)^{a+n-1}. \tag{5.1}$$

Next consider $G[t, t+dt]$ for some $t$ outside $\{x_1, \ldots, x_n\}$, and let for further convenience $F_0$ be the integral of a prior guess density $f_0$. Following the reasoning above one may show that $G$ has a density at $t \in (x_j, x_{j+1})$ equal to

$$g(t) = \frac{\Gamma(a+n)}{\Gamma(aF_0(t)+j)\,\Gamma(a\{1-F_0(t)\}+n-j)}\, a f_0(t)\, J[aF_0(t) + j,\ a\{1-F_0(t)\} + n - j], \tag{5.2}$$

where

$$J[\alpha, \gamma] = \int_0^{1/2}\!\!\int_0^{1/2} u^{\alpha-1} w^{\gamma-1} (1-u-w)^{-1}\, du\, dw = \Bigl(\tfrac12\Bigr)^{\alpha+\gamma} \sum_{m=0}^{\infty} \sum_{i=0}^{m} \binom{m}{i} \Bigl(\tfrac12\Bigr)^{m} \frac{1}{\alpha+i} \cdot \frac{1}{\gamma+m-i}.$$

It is (in principle) possible to compute for example the posterior expectation and the lower and upper 5 percent points for the distribution $G$ based on the results above.

Now consider the BB approximation method in this situation. It approximates the complicated $G$ above with

$$\hat G(t) = \mathrm{Pr}_{*,B}\{\hat\theta_B^* = \theta(\hat F_B^*) \le t\} = \mathrm{Pr}_{*,B}(\mathrm{median}\{X_1^*, \ldots, X_{n+a+1}^*\} \le t), \tag{5.3}$$

where the $X_i^*$'s are i.i.d. from $\hat F_B = (aF_0 + n\hat F)/(a + n)$. Assume for simplicity that $n+a+1$ is an odd integer, say $2m + 1$. Then

$$\hat G(t) = \mathrm{Pr}_{*,B}\{X_{(m+1)}^* \le t\} = \Pr[\mathrm{Bin}\{2m+1, \hat F_B(t)\} \ge m + 1]. \tag{5.4}$$

Expressions for $\hat G\{x_j\}$ and for the density $\hat g$ that $\hat G$ has between data points can be worked out based on this, and they can be compared with $G$ and $g$ obtained above. Such a study is not pursued here. Notice that the endpoints of the BB confidence interval can be found using binomial tables.
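
In place of binomial tables, (5.4) can be evaluated directly. The sketch below assumes an integer `a` with `n + a + 1` odd and a user-supplied prior guess c.d.f. `f0_cdf`; the function name is hypothetical.

```python
import math


def bb_median_cdf(t, xs, a, f0_cdf):
    """G_hat(t) = Pr[Bin{2m+1, F_B(t)} >= m+1], cf. (5.4)."""
    n = len(xs)
    size = n + a + 1
    if size % 2 == 0:
        raise ValueError("this sketch assumes n + a + 1 is odd")
    m = (size - 1) // 2
    # Bayes estimate F_B(t) = {a F_0(t) + n F_hat(t)} / (a + n)
    p = (a * f0_cdf(t) + sum(x <= t for x in xs)) / (a + n)
    # upper binomial tail: at least m + 1 of the 2m + 1 draws are <= t
    return sum(math.comb(size, k) * p**k * (1.0 - p)**(size - k)
               for k in range(m + 1, size + 1))
```

The BB interval endpoints are then the smallest `t` values at which this function reaches $\alpha$ and $1 - \alpha$, which can be located by a search over a grid or by bisection.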

Note also that in the non-informative case, where $a$ tends to zero, both $G$ and $\hat G$ are supported on the data points, with

$$G\{x_j\} = \binom{n-1}{j-1} \left(\tfrac{1}{2}\right)^{n-1},$$

$$\hat G\{x_j\} = \frac{(n+1)!}{(\tfrac{1}{2}n)!\,(\tfrac{1}{2}n)!} \int_{(j-1)/n}^{j/n} u^{n/2} (1-u)^{n/2}\, du.$$
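The two sets of point masses in the non-informative case can be tabulated side by side. The sketch below assumes $n$ is even, so that the BB sample size $n+1$ is odd, and computes $\hat G\{x_j\}$ as a difference of binomial tail probabilities from (5.4) with $F_B(x_j) = j/n$; the function names are hypothetical.

```python
import math

def exact_mass(j, n):
    """G{x_j} in the limit a -> 0: a binomial(n - 1, 1/2) weight."""
    return math.comb(n - 1, j - 1) / 2 ** (n - 1)

def bb_mass(j, n):
    """G-hat{x_j} in the limit a -> 0, for even n (BB sample size n + 1)."""
    m = n // 2
    def cdf(p):  # Pr[Bin(n + 1, p) >= m + 1]
        return sum(math.comb(n + 1, r) * p ** r * (1.0 - p) ** (n + 1 - r)
                   for r in range(m + 1, n + 2))
    return cdf(j / n) - cdf((j - 1) / n)

n = 20
exact = [exact_mass(j, n) for j in range(1, n + 1)]
bb = [bb_mass(j, n) for j in range(1, n + 1)]
for j in range(1, n + 1):
    print(f"j = {j:2d}   exact {exact[j - 1]:.4f}   BB {bb[j - 1]:.4f}")
```

Both columns sum to one and are symmetric about the middle of the ordered sample.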


5.3. The endpoint of a distribution.

As the final explicit example, consider

$$\theta(F) = \sup\{t : F(t) < 1\}.$$

The exact posterior distribution of $\theta$ based on the Dirichlet process assumption is particularly simple in this case. One has $\theta(F) \le t$ if and only if $F(t) = 1$, and $F(t)$ is Beta$\{aF_0(t) + n\hat F(t),\; a\{1 - F_0(t)\} + n\hat F(t, \infty)\}$. The probability of a Beta$\{\alpha, \beta\}$ variable being 1 is zero unless $\beta = 0$, in which case the probability suddenly is one. Thus $G(t)$ is zero if $F_0$ or $\hat F$ have some mass left for $(t, \infty)$, and is one only if $F_0(t) = 1$ and no $x_i$'s are greater than $t$, i.e. $G$ is simply concentrated at the single point $\max\{\theta(F_0), \max_{i \le n} x_i\}$.


Now let us watch BB at work. $\hat\theta_B^* = \theta(\hat F_B^*) = \max\{X_1^*, \ldots, X_{n+a+1}^*\}$ has distribution

$$\hat G(t) = \Pr_*\{\text{every } X_i^* \le t\} = F_B(t)^{n+a+1}.$$

If the right endpoint of the prior guess is $\infty$, then $G\{\infty\} = 1$, i.e. the a posteriori opinion is that the right endpoint of the unknown $F$ is indeed $\infty$. This can be compared to $\hat G(t) = \{\frac{a}{a+n} F_0(t) + \frac{n}{a+n}\}^{n+a+1}$ for $t \ge \max_{i \le n} x_i$. If on the other hand the right endpoint of the prior guess is finite, say $\theta(F_0) = 100$, then the exact a posteriori distribution is concentrated at the sensible point $\max\{100, \max_{i \le n} x_i\}$, and the BB approximation does not seriously disagree:

$$\hat G(t) = \begin{cases}
\left\{\dfrac{a}{a+n} F_0(t) + \dfrac{n}{a+n}\right\}^{n+a+1}, & \text{if } \max_{i \le n} x_i \le t \le 100, \\[1ex]
1, & \text{if } t \ge 100 \text{ and } \max_{i \le n} x_i \le 100, \\[1ex]
\left\{\dfrac{a}{a+n} + \dfrac{n}{a+n} \hat F(t)\right\}^{n+a+1}, & \text{if } 100 \le t < \max_{i \le n} x_i, \\[1ex]
1, & \text{if } 100 \le \max_{i \le n} x_i \le t.
\end{cases}$$
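The four cases can be checked numerically from $\hat G(t) = F_B(t)^{n+a+1}$. The sketch below is illustrative only: the prior guess is taken to be uniform on [0, 100] (so that $\theta(F_0) = 100$), and the two small data sets and the value $a = 2$ are made-up assumptions.

```python
def F0_uniform(t):
    """Prior guess c.d.f.: uniform on [0, 100] (an assumption for this sketch)."""
    return min(max(t / 100.0, 0.0), 1.0)

def bb_endpoint_cdf(t, xs, a, F0):
    """G-hat(t) = F_B(t)^(n + a + 1) for the right endpoint functional."""
    n = len(xs)
    F_hat = sum(x <= t for x in xs) / n
    F_B = (a * F0(t) + n * F_hat) / (a + n)
    return F_B ** (n + a + 1)

a = 2
below = [35.0, 62.0, 80.0]    # all observations below theta(F_0) = 100
above = [35.0, 62.0, 120.0]   # largest observation above 100

for xs in (below, above):
    exact_point = max(100.0, max(xs))   # where the exact posterior is concentrated
    print("exact posterior point:", exact_point)
    for t in (90.0, 100.0, 110.0, 120.0):
        print(f"  t = {t:5.1f}   G-hat(t) = {bb_endpoint_cdf(t, xs, a, F0_uniform):.4f}")
```

In both cases $\hat G$ reaches one exactly at the point where the exact posterior is concentrated, and puts some mass below it.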
5.4. An artificial example.

The following example is indeed constructed but hopefully illustrative.

Assume that the abilities and intelligence of a certain interesting minority population are studied, and assume that 20 individuals from this population are given a standard IQ test. Let us suppose that the resulting $X_1, \ldots, X_{20}$ really come from a normal (125, 10²) distribution; the data points $x_1, \ldots, x_{20}$ used below were simulated from this distribution. Some parameters of interest could be

$$\theta(F) = \text{mean of } F = \int x\, dF(x),$$

$$\sigma(F) = \text{standard deviation of } F = \left[\int \{x - \theta(F)\}^2\, dF(x)\right]^{1/2},$$

$$\gamma(F) = \text{upper 25 percent point of } F = F^{-1}(3/4).$$

We include four different prior distributions in this modest experiment, namely (i) $F_0$ = normal (100, 15²) (from which our own IQs presumably once were drawn), $a = 2$; (ii) $F_0$ = normal (100, 15²), $a = 6$; (iii) $F_0$ = normal (100, 15²), $a = 10$; (iv) $F_0$ = uniform on [100, 150], $a = 6$. The data points
turned  out  to  be  94.8,  114.7,  115.5,  117.5,  119.1,  121.2,  122.6,  124.0,  124.6, 
128.7,  130.0,  130.5,  130.8,  131.4,  132.2,  133.5,  134.4,  135.3,  136.4,  144.6. 
The  empirical  mean  is  126.07  and  the  empirical  standard  deviation  is  10.70. 

The mean. Histograms of BOOT = 1000 BB values of $\hat\theta_B^*$ are shown in Figure 1 (i), (ii), (iii), (iv), corresponding to the four combinations of $(a, F_0)$ listed above. For example, the 1000 values leading to Figure 1 (i) are of the type $\hat\theta_B^* = \theta(\hat F_B^*) = \operatorname{mean}\{X_i^*\}$ where $X_1^*, \ldots, X_{23}^*$ are i.i.d. from $\frac{2}{22} F_0 + \frac{20}{22} \hat F$. The distributions are unimodal and fairly symmetric, and for all practical purposes continuous ($\hat\theta_B^*$ has microscopic point masses $1.33 \cdot 10^{-31}$ in each of the 20 data points in Experiment (i), for instance).
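The simulation behind such histograms can be sketched as follows, here for Experiment (ii). The function names, the random seed, and the simple quantile convention are assumptions of this sketch, so the numbers it produces will not coincide exactly with those obtained in the report.

```python
import random
import statistics

DATA = [94.8, 114.7, 115.5, 117.5, 119.1, 121.2, 122.6, 124.0, 124.6, 128.7,
        130.0, 130.5, 130.8, 131.4, 132.2, 133.5, 134.4, 135.3, 136.4, 144.6]

def normal_prior_draw(rng):
    """One draw from the prior guess F_0 = normal(100, 15^2)."""
    return rng.gauss(100.0, 15.0)

def draw_from_F_B(rng, data, a, prior_draw):
    """One draw from F_B = (a F_0 + n F-hat) / (a + n)."""
    n = len(data)
    if rng.random() < a / (a + n):
        return prior_draw(rng)
    return rng.choice(data)

def bb_values(data, a, prior_draw, functional, boot=1000, seed=0):
    """Sorted BB replicates of a functional, each from a sample of size n + a + 1."""
    rng = random.Random(seed)
    size = len(data) + a + 1
    return sorted(
        functional([draw_from_F_B(rng, data, a, prior_draw) for _ in range(size)])
        for _ in range(boot)
    )

def percentile_interval(sorted_values, tail):
    """Interval cutting off a fraction `tail` of the replicates in each end."""
    k = int(tail * len(sorted_values))
    return sorted_values[k], sorted_values[-k - 1]

values = bb_values(DATA, 6, normal_prior_draw, statistics.fmean)
lo, hi = percentile_interval(values, 0.05)
print("90 percent BB interval:", round(lo, 2), round(hi, 2))
print("BB median:", round(statistics.median(values), 2))

# Point mass of the BB mean at a single data point in Experiment (i):
# all 23 draws must equal that point, each with probability 1/22.
print((1 / 22) ** 23)
```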
Table 1 lists 95 percent, 90 percent, and 80 percent confidence intervals for the unknown mean $\theta(F)$, reached by Bayesians (i), (ii), (iii), (iv). Also listed is the median of the BB posterior distribution, which is also a good approximation to the Bayesian point estimate using absolute loss. Note that Bayesian (i), with $a = 2$, behaves almost like the standard bootstrap user.


TABLE 1

Confidence intervals for the mean, reached by Bayesians (i), (ii), (iii), (iv).

                  95 percent          90 percent          80 percent          median

Bayesian (i)      [117.74, 129.02]    [118.71, 128.19]    [119.98, 127.12]    123.68
         (ii)     [113.33, 125.60]    [114.22, 124.93]    [115.69, 123.90]    120.07
         (iii)    [110.53, 123.10]    [111.55, 122.13]    [113.10, 121.24]    117.37
         (iv)     [121.41, 130.01]    [121.94, 129.33]    [122.83, 128.69]    125.74


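The mean functional is simple enough to compute moments of its true posterior distribution. Ferguson (1973) has shown

$$E_B\{\theta(F) \mid \text{data}\} = \frac{a}{a+n}\, \theta(F_0) + \frac{n}{a+n}\, \theta(\hat F), \tag{5.5}$$

and methods from the same paper can be used to obtain

$$\operatorname{Var}_B\{\theta(F) \mid \text{data}\} = \frac{1}{n+a+1} \left[\frac{a}{a+n}\, \sigma^2(F_0) + \frac{n}{a+n}\, \sigma^2(\hat F) + \frac{a}{a+n}\, \frac{n}{a+n}\, \{\theta(F_0) - \theta(\hat F)\}^2\right]. \tag{5.6}$$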


(Here $\theta(\hat F) = \bar x = \sum_{i=1}^n x_i / n$ and $\sigma^2(\hat F) = \frac{1}{n} \sum_{i=1}^n (x_i - \bar x)^2$.) Hence we are in a position to correct the BB intervals of Table 1 for bias and variance, using (4.4) and (4.5). These changes turn out to be small, and perhaps insignificant compared to the variation resulting from differences in prior opinion from Bayesian to Bayesian. As a random example, consider 90 percent intervals for $\theta$ in Experiment (ii). The uncorrected BB percentile interval is $[\hat G^{-1}(.05), \hat G^{-1}(.95)] = [114.22, 124.93]$ as in Table 1. (5.5) above gives $\nu_0 = E_B\{\theta(F) \mid \text{data}\} = \frac{6}{26}\, 100 + \frac{20}{26}\, \bar x = 120.0573$, whereas the BB approximation gives $\hat\nu_0 = E_*\{\hat\theta_B^*\} = (1/1000) \sum_{b=1}^{1000} \hat\theta_B^{*b} = 119.8952$, i.e. $\varepsilon = -.1621$, cf. (4.3). The bias corrected BB interval is accordingly [114.06, 124.77]. Next, $\tau_0^2 = \operatorname{Var}_B\{\theta(F) \mid \text{data}\} = 9.6501$ using (5.6), whereas $\hat\tau_0^2 = 10.2493$, i.e. $1 + \delta = 1.0306$ in the notation of Section 4. The bias and variance corrected interval (4.5) becomes [114.55, 124.94].
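The arithmetic of this example can be retraced with a short sketch. The posterior moments follow (5.5) and (5.6), evaluated at the rounded summary values 126.07 and 10.70 quoted earlier, so they agree with the figures above only to about two decimals. Formula (4.5) is not restated in this section; the function `rescaled_interval` below assumes it recentres the BB quantiles at $\nu_0$ and shrinks their spread by $\tau_0 / \hat\tau_0$, an assumption that happens to reproduce the interval reported above. All function names are hypothetical.

```python
import math

def posterior_mean(a, n, prior_mean, xbar):
    """E_B{theta(F) | data} as in (5.5)."""
    return (a * prior_mean + n * xbar) / (a + n)

def posterior_var_of_mean(a, n, prior_mean, prior_var, xbar, s2):
    """Var_B{theta(F) | data} as in (5.6); s2 is sigma^2(F-hat)."""
    w0, w1 = a / (a + n), n / (a + n)
    mixture_var = w0 * prior_var + w1 * s2 + w0 * w1 * (prior_mean - xbar) ** 2
    return mixture_var / (n + a + 1)

def rescaled_interval(q_lo, q_hi, nu0, nu_hat, tau0_sq, tau_hat_sq):
    """Assumed form of the bias and variance correction (see the text above)."""
    scale = math.sqrt(tau0_sq / tau_hat_sq)
    return nu0 + scale * (q_lo - nu_hat), nu0 + scale * (q_hi - nu_hat)

# Experiment (ii): a = 6, n = 20, F_0 = normal(100, 15^2).
nu0 = posterior_mean(6, 20, 100.0, 126.07)
tau0_sq = posterior_var_of_mean(6, 20, 100.0, 15.0 ** 2, 126.07, 10.70 ** 2)
print(round(nu0, 4), round(tau0_sq, 4))

# Corrected interval from the values reported in the text.
lo, hi = rescaled_interval(114.22, 124.93, 120.0573, 119.8952, 9.6501, 10.2493)
print(round(lo, 2), round(hi, 2))
```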

The standard deviation. Histograms of BOOT = 1000 values of $\hat\sigma_B^*$ = standard deviation of $X_1^*, \ldots, X_{20+a+1}^*$ are shown in Figure 2 (i), (ii), (iii), (iv), corresponding again to the four experiments. The distributions are again unimodal and practically continuous, but not symmetric.

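The bias correction procedure of Section 4 turns out to be more important for this functional than it was for the mean. One may prove, again using methods of Ferguson (1973), that

$$E_B\{\sigma^2(F) \mid \text{data}\} = \frac{a+n}{a+n+1} \left[\frac{a}{a+n}\, \sigma^2(F_0) + \frac{n}{a+n}\, \sigma^2(\hat F) + \frac{a}{a+n}\, \frac{n}{a+n}\, \{\theta(F_0) - \theta(\hat F)\}^2\right],$$

cf. (5.6). The $1 - 2\alpha$ bias corrected confidence interval for $\sigma(F)$ is therefore

$$\{\hat G^{-1}(\alpha)^2 - \varepsilon\}^{1/2} \le \sigma(F) \le \{\hat G^{-1}(1-\alpha)^2 - \varepsilon\}^{1/2}, \tag{5.7}$$

where $\varepsilon$ is the difference between the average value of the observed $(\hat\sigma_B^*)^2$'s and $E_B\{\sigma^2(F) \mid \text{data}\}$, as in (4.3). Notice the similarity between $E_B\{\sigma^2(F) \mid \text{data}\}$ and $\operatorname{Var}_B\{\theta(F) \mid \text{data}\}$.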
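A small sketch of the correction (5.7) follows. The posterior expectation uses the formula above with the rounded summary values 126.07 and 10.70. The value of $\varepsilon$ is not reported in the text; the number 15.65 used below is a hypothetical value, back-solved from the two 90 percent intervals for Experiment (ii) in Table 2, and serves only to show the mechanics. Function names are likewise assumptions.

```python
import math

def expected_posterior_variance(a, n, prior_mean, prior_var, xbar, s2):
    """E_B{sigma^2(F) | data}: (a + n)/(a + n + 1) times the variance of F_B."""
    w0, w1 = a / (a + n), n / (a + n)
    mixture_var = w0 * prior_var + w1 * s2 + w0 * w1 * (prior_mean - xbar) ** 2
    return (a + n) / (a + n + 1) * mixture_var

def bias_corrected_sd_interval(q_lo, q_hi, eps):
    """Interval (5.7): subtract eps on the variance scale, then take roots."""
    lower = math.sqrt(max(q_lo ** 2 - eps, 0.0))
    upper = math.sqrt(max(q_hi ** 2 - eps, 0.0))
    return lower, upper

# Experiment (ii): a = 6, n = 20, F_0 = normal(100, 15^2).
target = expected_posterior_variance(6, 20, 100.0, 15.0 ** 2, 126.07, 10.70 ** 2)
print(round(target, 2), round(math.sqrt(target), 2))

eps = 15.65  # hypothetical: average of BB variances minus `target`
lo, hi = bias_corrected_sd_interval(11.83, 20.31, eps)
print(round(lo, 2), round(hi, 2))
```

Because the correction is applied to squared values, the lower endpoint moves more than the upper one.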

Table 2 gives confidence intervals for $\sigma(F)$ in the four experiments, both the uncorrected BB and the bias corrected BB interval (5.7). Also listed are the posterior medians, to be thought of as Bayes point estimates of $\sigma(F)$ (under absolute loss).


TABLE 2

Confidence intervals for the standard deviation in Experiments (i), (ii), (iii), (iv); usual BB interval (upper line) and bias corrected BB interval (lower line).

               Bayesian (i)      (ii)              (iii)             (iv)

95 percent     [7.82, 19.18]     [10.94, 21.00]    [13.02, 21.44]    [7.57, 14.85]
               [7.10, 18.90]     [10.20, 20.62]    [12.43, 21.09]    [7.02, 14.58]

90 percent     [8.33, 17.90]     [11.83, 20.31]    [13.71, 20.77]    [8.20, 14.47]
               [7.65, 17.60]     [11.15, 19.92]    [13.15, 20.40]    [7.69, 14.19]

80 percent     [9.45, 16.77]     [12.61, 19.27]    [14.48, 20.20]    [9.03, 13.79]
               [8.86, 16.44]     [11.96, 18.86]    [13.95, 19.82]    [8.57, 13.49]

median         13.02             15.92             17.34             11.57

It is also possible to correct the intervals further for possible inaccuracy of the BB approximation to the variance of $\sigma^2(F)$ given data; this would involve a quite lengthy formula for $E_B\{\sigma^4(F) \mid \text{data}\} - [E_B\{\sigma^2(F) \mid \text{data}\}]^2$, however, and is not pursued here.

The upper quartile. The histograms in Figure 3 (i), (ii), (iii), (iv) come from BOOT = 1000 values of $\hat\gamma_B^* = \gamma(\hat F_B^*) = (\hat F_B^*)^{-1}(3/4)$. The sample sizes of the BB samples in question are 23, 27, 31, 27, so we take $\hat\gamma_B^*$ to be respectively order statistic 18, order statistic 21, order statistic 24, and order statistic 21. The distribution of $\hat\gamma_B^*$ is different from that of $\hat\theta_B^*$ and $\hat\sigma_B^*$ in that it has most of its mass concentrated on the observed data points, cf. theoretical calculations for the median in 5.2.
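The order statistics quoted above correspond to taking the smallest rank $k$ with $k/N \ge 3/4$ in a BB sample of size $N = 20 + a + 1$. A brief sketch, with hypothetical function names:

```python
def upper_quartile_rank(N):
    """Smallest 1-based rank k with k / N >= 3/4, i.e. ceil(3N / 4)."""
    return -(-3 * N // 4)

def bb_upper_quartile(sample):
    """Upper quartile of one BB sample, taken as a single order statistic."""
    ordered = sorted(sample)
    return ordered[upper_quartile_rank(len(ordered)) - 1]

# BB sample sizes 20 + a + 1 for a = 2, 6, 10, 6 in Experiments (i) to (iv).
sizes = [20 + a + 1 for a in (2, 6, 10, 6)]
ranks = [upper_quartile_rank(N) for N in sizes]
print(sizes)  # [23, 27, 31, 27]
print(ranks)  # [18, 21, 24, 21]
```

Since the result is always one of the sampled values, and most of these are data points when $a$ is small relative to $n$, the concentration of mass on the observed data follows directly.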

Table 3 lists 95, 90, and 80 percent BB confidence intervals for $\gamma(F)$ for the four choices of prior distribution of $F$ in its space of all distributions.


TABLE 3

Confidence intervals for the upper quartile in Experiments (i), (ii), (iii), (iv).

               (i)               (ii)              (iii)             (iv)

95 percent     [130.0, 135.3]    [126.0, 135.3]    [124.0, 134.4]    [130.5, 136.4]
90 percent     [130.5, 135.3]    [128.7, 134.4]    [124.9, 134.4]    [130.8, 136.4]
80 percent     [130.8, 134.4]    [130.0, 134.4]    [128.7, 133.5]    [131.4, 135.4]

median         132.2             131.4             130.8             133.5